52 research outputs found

    Toward Generalizable Machine Learning Models in Speech, Language, and Hearing Sciences: Power Analysis and Sample Size Estimation

    This study's first purpose is to provide quantitative evidence that would incentivize researchers to use the more robust method of nested cross-validation. The second purpose is to present methods and MATLAB code for power analysis of ML-based analyses during the design of a study. Monte Carlo simulations were used to quantify the interactions between the employed cross-validation method, the discriminative power of features, the dimensionality of the feature space, and the dimensionality of the model. Four different cross-validation methods (single holdout, 10-fold, train-validation-test, and nested 10-fold) were compared based on the statistical power and statistical confidence of the ML models. Distributions of the null and alternative hypotheses were used to determine the minimum required sample size for obtaining a statistically significant outcome (α = 0.05, 1 - β = 0.8). Statistical confidence of the model was defined as the probability of correct features being selected and hence being included in the final model. Our analysis showed that the model generated based on the single holdout method had very low statistical power and statistical confidence and that it significantly overestimated the accuracy. Conversely, the nested 10-fold cross-validation resulted in the highest statistical confidence and the highest statistical power, while providing an unbiased estimate of the accuracy. The required sample size with a single holdout could be 50% higher than what would be needed if nested cross-validation were used. Confidence in the model based on nested cross-validation was as much as four times higher than the confidence in the single holdout-based model. A computational model, MATLAB code, and lookup tables are provided to assist researchers with estimating the sample size during the design of their future studies.
    Comment: Under review at JSLH
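    The sample-size criterion in this abstract (α = 0.05, power 1 - β = 0.8) can be illustrated with a much simpler setting than the study's Monte Carlo simulations. The sketch below is not the authors' MATLAB code: it assumes a fixed classifier scored on n independent test samples, so that the number of correct predictions is binomial, with chance accuracy 0.5 as the null hypothesis. All function names are hypothetical.

```python
from math import comb


def binom_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, p_null=0.5, alpha=0.05):
    """Smallest count of correct predictions that rejects the null at level alpha."""
    # k = n + 1 has tail probability 0, so the loop always returns.
    for k in range(n + 2):
        if binom_tail(k, n, p_null) <= alpha:
            return k


def power(n, p_alt, p_null=0.5, alpha=0.05):
    """Probability of rejecting the null when the true accuracy is p_alt."""
    return binom_tail(critical_count(n, p_null, alpha), n, p_alt)


def min_sample_size(p_alt, target_power=0.8, p_null=0.5, alpha=0.05, n_max=500):
    """First test-set size whose power reaches the target, or None if none does."""
    for n in range(1, n_max + 1):
        if power(n, p_alt, p_null, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Assumed true accuracies, chosen only for illustration.
    for accuracy in (0.6, 0.7, 0.8, 0.9):
        print(accuracy, min_sample_size(accuracy))
```

    Because the binomial distribution is discrete, power does not rise strictly monotonically with n, so the first n that reaches the target is only a lower bound; a cautious design would also check the next few sample sizes.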

    Statistical properties of linear prediction analysis underlying the challenge of formant bandwidth estimation

    Formant bandwidth estimation is often observed to be more challenging than the estimation of formant center frequencies due to the presence of multiple glottal pulses within a period and short closed-phase durations. This study explores inherently different statistical properties between linear prediction (LP)-based estimates of formant frequencies and their corresponding bandwidths that may be explained in part by the statistical bounds on the variances of estimated LP coefficients. A theoretical analysis of the Cramér-Rao bounds on LP estimator variance indicates that the accuracy of bandwidth estimation is approximately twice as low as that of center frequency estimation. Monte Carlo simulations of all-pole vowels with stochastic and mixed-source excitation demonstrate that the distributions of estimated LP coefficients exhibit expectedly different variances for each coefficient. Transforming the LP coefficients to formant parameters results in variances of bandwidth estimates being typically larger than the variances of respective center frequency estimates, depending on vowel type and fundamental frequency. These results provide additional evidence underlying the challenge of formant bandwidth estimation due to inherent statistical properties of LP-based speech analysis.
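    The transformation from LP coefficients to formant parameters mentioned in this abstract is commonly done through the poles of the all-pole filter: a pole at radius r and angle θ maps to a center frequency θ·fs/(2π) and a bandwidth of -ln(r)·fs/π. The sketch below shows that textbook mapping for a single second-order section; the abstract does not state the study's exact procedure, and the function names are hypothetical.

```python
import cmath
import math


def formant_to_lp(freq_hz, bw_hz, fs_hz):
    """Coefficients (a1, a2) of A(z) = 1 + a1 z^-1 + a2 z^-2 for one resonance."""
    radius = math.exp(-math.pi * bw_hz / fs_hz)
    angle = 2.0 * math.pi * freq_hz / fs_hz
    return -2.0 * radius * math.cos(angle), radius * radius


def lp_to_formant(a1, a2, fs_hz):
    """Center frequency and bandwidth (Hz), assuming a complex-conjugate pole pair."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    pole = (-a1 + disc) / 2.0  # upper-half-plane root of z^2 + a1 z + a2
    freq_hz = cmath.phase(pole) * fs_hz / (2.0 * math.pi)
    bw_hz = -math.log(abs(pole)) * fs_hz / math.pi
    return freq_hz, bw_hz


if __name__ == "__main__":
    # Illustrative values only: a 500 Hz formant with 80 Hz bandwidth at 10 kHz.
    a1, a2 = formant_to_lp(500.0, 80.0, 10000.0)
    print(lp_to_formant(a1, a2, 10000.0))
```

    In this mapping the frequency depends on the pole angle and the bandwidth on the logarithm of the pole radius, so the same coefficient error can affect the two parameters quite differently.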

    Direct measurement and modeling of intraglottal, subglottal, and vocal fold collision pressures during phonation in an individual with a hemilaryngectomy

    The purpose of this paper is to report on the first in vivo application of a recently developed transoral, dual-sensor pressure probe that directly measures intraglottal, subglottal, and vocal fold collision pressures during phonation. Synchronous measurement of intraglottal and subglottal pressures was accomplished using two miniature pressure sensors mounted on the end of the probe and inserted transorally in a 78-year-old male who had previously undergone surgical removal of his right vocal fold for treatment of laryngeal cancer. The endoscopist used one hand to position the custom probe against the surgically medialized scar band that replaced the right vocal fold and used the other hand to position a transoral endoscope to record laryngeal high-speed videoendoscopy of the vibrating left vocal fold contacting the pressure probe. Visualization of the larynx during sustained phonation allowed the endoscopist to place the dual-sensor pressure probe such that the proximal sensor was positioned intraglottally and the distal sensor subglottally. The proximal pressure sensor was verified to be in the strike zone of vocal fold collision during phonation when the intraglottal pressure signal exhibited three characteristics: an impulsive peak at the start of the closed phase, a rounded peak during the open phase, and a minimum value around zero immediately preceding the impulsive peak of the subsequent phonatory cycle. Numerical voice production modeling was applied to validate model-based predictions of vocal fold collision pressure using kinematic vocal fold measures. The results successfully demonstrated feasibility of in vivo measurement of vocal fold collision pressure in an individual with a hemilaryngectomy, motivating ongoing data collection that is designed to aid in the development of vocal dose measures that incorporate vocal fold impact collision and stresses.
    Author affiliations:
    Mehta, Daryush D. (Massachusetts General Hospital, United States)
    Kobler, James B. (Massachusetts General Hospital, United States)
    Zeitels, Steven M. (Harvard Medical School, Department of Medicine, Massachusetts General Hospital, United States)
    Zañartu, Matías (Universidad Técnica Federico Santa María, Chile)
    Ibarra, Emiro J. (Universidad Técnica Federico Santa María, Chile)
    Alzamendi, Gabriel Alejandro (Universidad Nacional de Entre Ríos, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Santa Fe, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, Argentina)
    Manriquez, Rodrigo (Universidad Técnica Federico Santa María, Chile)
    Erath, Byron D. (Clarkson University, United States)
    Peterson, Sean D. (University of Waterloo, Canada)
    Petrillo, Robert H. (Center For Laryngeal Surgery and Voice Rehabilitation, United States)
    Hillman, Robert E. (Center For Laryngeal Surgery and Voice Rehabilitation, United States; Harvard Medical School, Department of Medicine, Massachusetts General Hospital, United States)

    Estimation of Subglottal Pressure, Vocal Fold Collision Pressure, and Intrinsic Laryngeal Muscle Activation From Neck-Surface Vibration Using a Neural Network Framework and a Voice Production Model

    The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck-surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels. The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
    Author affiliations:
    Ibarra, Emiro J. (Universidad Tecnica Federico Santa Maria, Chile)
    Parra, Jesús A. (Universidad Tecnica Federico Santa Maria, Chile)
    Alzamendi, Gabriel Alejandro (Universidad Nacional de Entre Ríos, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Santa Fe, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, Argentina)
    Cortés, Juan P. (Universidad Tecnica Federico Santa Maria, Chile)
    Espinoza, Víctor M. (Universidad de Chile, Chile)
    Mehta, Daryush D. (Center For Laryngeal Surgery And Voice Rehabilitation, United States)
    Hillman, Robert E. (Center For Laryngeal Surgery And Voice Rehabilitation, United States)
    Zañartu, Matías (Universidad Tecnica Federico Santa Maria, Chile)
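    The subglottal pressure errors in this abstract are given in both pascals and cm H2O. As a quick check of that arithmetic, the sketch below uses the conventional conversion 1 cm H2O = 98.0665 Pa and defines the two error metrics in their usual form. The pressure values in the example are made up for illustration and are not data from the study; the function names are hypothetical.

```python
import math

PA_PER_CMH2O = 98.0665  # conventional value of 1 cm H2O in pascals


def pa_to_cmh2o(pressure_pa):
    """Convert a pressure from pascals to centimeters of water."""
    return pressure_pa / PA_PER_CMH2O


def mean_absolute_error(estimates, references):
    """Average of |estimate - reference| over paired values."""
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)


def root_mean_square_error(estimates, references):
    """Square root of the mean squared difference over paired values."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Reported errors from the abstract, converted to cm H2O.
    print(round(pa_to_cmh2o(191), 2), round(pa_to_cmh2o(243), 2))
    # Hypothetical estimated vs. reference pressures in Pa.
    est, ref = [800.0, 1000.0], [700.0, 1200.0]
    print(mean_absolute_error(est, ref), root_mean_square_error(est, ref))
```

    The root-mean-square error is never smaller than the mean absolute error on the same data, which matches the ordering of the two reported values.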

    Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update

    Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating between hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders.

    Method for Horizontal Calibration of Laser-Projection Transnasal Fiberoptic High-Speed Videoendoscopy

    Objective: Calibrated horizontal measurements (e.g., mm) from endoscopic procedures could be utilized for advancement of evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant non-linear distortion that would further complicate calibrated measurements. This study used a recently developed in vivo laser-projection fiberoptic laryngoscope and proposes a method for calibrated spatial measurements. Method: A set of circular grids was recorded at multiple working distances. A statistical model was trained to map the pixel length of the object, the working distance, and the spatial location of the target object to its length in mm. Result: A detailed analysis of the performance of the proposed method is presented. The analyses have shown that the accuracy of the proposed method does not depend on the working distance and length of the target object. The estimated average magnitude of error was 0.27 mm, which is three times lower than the existing alternative. Conclusion: The presented method can achieve sub-millimeter accuracy in horizontal measurement. Significance: Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.
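    The dependence of image size on working distance described in this abstract can be illustrated with an idealized pinhole camera, where physical length is proportional to pixel length times working distance. The sketch below fits that single scale factor by least squares from calibration samples. It is a deliberate simplification and not the study's statistical model: it ignores the non-linear distortion and spatial-location terms the abstract mentions, and the sample values and function names are hypothetical.

```python
def fit_scale(samples):
    """Least-squares k in length_mm ~ k * length_px * distance_mm.

    samples is an iterable of (length_px, distance_mm, length_mm) tuples,
    for example measured from a calibration grid at several working distances.
    """
    numerator = sum(px * dist * mm for px, dist, mm in samples)
    denominator = sum((px * dist) ** 2 for px, dist, _ in samples)
    return numerator / denominator


def pixels_to_mm(length_px, distance_mm, k):
    """Predict physical length from pixel length and working distance."""
    return k * length_px * distance_mm


if __name__ == "__main__":
    # Made-up calibration samples consistent with k = 0.002.
    calibration = [(100.0, 5.0, 1.0), (50.0, 10.0, 1.0), (200.0, 5.0, 2.0)]
    k = fit_scale(calibration)
    print(k, pixels_to_mm(150.0, 8.0, k))
```

    A model that also accounts for lens distortion would add terms for the object's position in the image, which is the role the abstract assigns to the spatial location input.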